Automatic Contextual Text Correction using the Linguistic habits Graph Lhg

نویسندگان

  • Marcin Gadamer
  • Adrian Horzyk
چکیده

Automatyczna korekta tekstów stanowi ważny problem z punktu widzenia dzisiejszych procesorów i edytorów tekstów. W tym artykule został przedstawiony innowacyjny algorytm służący do automatyzacji kontekstowej korekty tekstów z wykorzystaniem Grafu Przyzwyczajeń Lingwistycznych (LHG), który również opisano w tym artykule. W tym celu zbudowano specjalistycznego pająka internetowego przeszukującego strony internetowe celem skonstruowania Grafu Przyzwyczajeń Lingwistycznych (LHG) na podstawie analizy korpusów tekstów uzyskanych z polskojęzycznych stron internetowych. Otrzymane wyniki korekty tekstu z wykorzystaniem tego algorytmu, bazującego na grafie LHG, zostały porównane z komercyjnymi programami do korekty tekstu takimi jak Microsoft Word 2007, Open Office Writer 3.0 oraz z wyszukiwarką Google. Otrzymane wyniki korekty tekstów okazały się być znacznie lepsze niż w wyżej wymienionych komercyjnych narzędziach.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Impact of Contextual Clue Selection on Inference

Linguistic information can be conveyed in the form of speech and written text, but it is the content of the message that is ultimately essential for higher-level processes in language comprehension, such as making inferences and associations between text information and knowledge about the world. Linguistically, inference is the shovel that allows receivers to dig meaning out from the text with...

متن کامل

Emotion Detection in Persian Text; A Machine Learning Model

This study aimed to develop a computational model for recognition of emotion in Persian text as a supervised machine learning problem. We considered Pluthchik emotion model as supervised learning criteria and Support Vector Machine (SVM) as baseline classifier. We also used NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...

متن کامل

Cipher text only attack on speech time scrambling systems using correction of audio spectrogram

Recently permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary which may not always be the case. To show insecurity of these ciphers, we present a cipher-text only attack on speech permutation ciphers. We show inherent redundancies of speech can pave the path for a successful cipher-text only attack. To that end, regularities ...

متن کامل

Towards Contextual Healthiness Classification of Food Items - A Linguistic Approach

We explore the feasibility of contextual healthiness classification of food items. We present a detailed analysis of the linguistic phenomena that need to be taken into consideration for this task based on a specially annotated corpus extracted from web forum entries. For automatic classification, we compare a supervised classifier and rule-based classification. Beyond linguistically motivated ...

متن کامل

Multi-level post-processing for Korean character recognition using morphological analysis and linguistic evaluation

Most of the post-processing methods for character recognition rely on contextual information of character and word-fragment levels. However, due to linguistic characteristics of Korean, such low-level information alone is not sufficient for high-quality character-recognition applications, and we need much higher-level contextual information to improve the recognition results. This paper present...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computer Science (AGH)

دوره 10  شماره 

صفحات  -

تاریخ انتشار 2009